Data Wrangling

Inspiration: https://gradientdescending.com/survivor-data-from-the-tv-series-in-r/

Survivors Location

Castaways - Exploring Players Stats

Where do players come from?

# tm_shape(usmapdata::us_map) +
#   tm_fill() +
#   tm_borders()

Who played the most?

# Count how many seasons each castaway appeared in, most frequent first.
participations_count <- castaways %>%
  group_by(castaway_id, full_name) %>%
  summarise(num_participations = n()) %>%
  arrange(desc(num_participations))
## `summarise()` has grouped output by 'castaway_id'. You can override using the
## `.groups` argument.

Memorable players - Played at least two seasons or made the jury

## `summarise()` has grouped output by 'castaway_id'. You can override using the
## `.groups` argument.
## Joining, by = c("castaway_id", "full_name")
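The "at least two seasons or made the jury" rule can be sketched as a single grouped summary followed by a filter. This is only an illustration, not the code behind the output above (which appears to use a summary plus a join): the toy data frame and the `made_jury` column are made up for the example, and the real `castaways` table may encode jury membership differently.

```r
library(dplyr)

# Toy stand-in for `castaways`; the rows and the `made_jury` column are hypothetical.
toy_castaways <- data.frame(
  castaway_id = c("A", "A", "B", "C"),
  full_name   = c("Player A", "Player A", "Player B", "Player C"),
  made_jury   = c(FALSE, FALSE, TRUE, FALSE)
)

memorable <- toy_castaways %>%
  group_by(castaway_id, full_name) %>%
  summarise(
    num_participations = n(),            # seasons played
    ever_on_jury       = any(made_jury), # made the jury at least once
    .groups = "drop"
  ) %>%
  # Keep players with two or more seasons, or at least one jury appearance.
  filter(num_participations >= 2 | ever_on_jury)

memorable
```

In the toy data, Player A qualifies through repeat appearances and Player B through the jury, while Player C is dropped.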

Players who have been able to play more than once without making it to the jury

Challenges

Types of challenges

# Share of challenges (in %) that involve each challenge type.
challenge_description %>%
  select(-challenge_id, -challenge_name) %>%
  summarise(across(everything(), mean)) %>%
  sapply(round, 3) * 100
##     puzzle       race  precision  endurance   strength turn_based    balance 
##       26.9       81.4       20.8       13.0        5.6       14.9       16.1 
##       food  knowledge     memory       fire      water 
##        2.6        6.2        2.4        3.7       21.8

Winners

Winners' confessionals vs. the rest of the players

## `summarise()` has grouped output by 'season_name', 'castaway'. You can override
## using the `.groups` argument.

Winners Personality Types (vs norm)

There are 16 personality types. We expect the share of people in each type to be around 1/16, i.e. 6.25%.
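The 1/16 baseline can be turned into a small comparison helper. The sketch below is a hypothetical illustration: `share_vs_uniform` is not part of the analysis, and the toy input is made up purely to show the shape of the result. It returns, for each type present, how many percentage points its observed share sits above or below the uniform 6.25%.

```r
# Share each type would have if all 16 were equally common.
n_types <- 16
expected_share <- 1 / n_types  # 0.0625, i.e. 6.25%

# Hypothetical helper: observed share per type minus the uniform baseline,
# expressed in percentage points.
share_vs_uniform <- function(types, n_types = 16) {
  observed <- prop.table(table(types))
  round(100 * (observed - 1 / n_types), 2)
}

# Toy input (made-up labels, not Survivor data).
toy_types <- c("ENFP", "ENFP", "ISTJ", "INTJ")
share_vs_uniform(toy_types)
```

Types that never occur in the input are absent from the result, so on real data they would need to be added back with a share of zero before drawing conclusions.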

## # A tibble: 2 × 2
##   is_introvert count
##   <lgl>        <int>
## 1 FALSE           26
## 2 TRUE            15
## # A tibble: 3 × 2
##   is_introvert count
##   <lgl>        <int>
## 1 FALSE          423
## 2 TRUE           336
## 3 NA              21
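To make the two tables comparable, the counts can be converted to shares. The sketch below assumes that the first table covers the winners (41 rows) and the second covers all castaways (780 rows, 21 of them with no recorded type); the tables themselves are not labeled, so that reading is an assumption.

```r
# Counts copied from the two tables above.
# Assumption: first table = winners, second table = all castaways.
winners  <- c(extravert = 26, introvert = 15)
everyone <- c(extravert = 423, introvert = 336)  # the 21 NA rows are excluded

# Share of introverts in each group, in percent.
shares <- c(
  winners  = unname(winners["introvert"] / sum(winners)),
  everyone = unname(everyone["introvert"] / sum(everyone))
)
round(100 * shares, 1)
#>  winners everyone
#>     36.6     44.3
```

Under that reading, introverts make up a smaller share of winners than of castaways overall, although with only 41 winners the gap should be interpreted cautiously.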

Winners votes out

The number of votes is not a good predictor of who wins.

Immunity Idols

plot_lm(castaways$day, castaways$immunity_idols_won)

# Flag the winner of each season.
castaways$is_winner <- castaways$result == "Sole Survivor"

# Days lasted vs. immunity idols won; winners are drawn in green.
plot(castaways$day, castaways$immunity_idols_won,
     col = ifelse(castaways$is_winner, "green", "black"))

Show Rating

Has the show's popularity or quality declined? (sort by color)

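library(dplyr)       # %>%
library(ggplot2)     # ggplot(), geom_area(), geom_line()
library(hrbrthemes)  # theme_ipsum()
library(plotly)      # ggplotly()

# Viewership per episode over time, as a filled area with an outline.
p <- viewers %>%
  ggplot(aes(x = episode_date, y = viewers)) +
    geom_area(fill = "#69b3a2", alpha = 0.5) +
    geom_line(color = "#69b3a2") +
    ylab("Viewers (millions)") +
    xlab("Date") +
    theme_ipsum()

# Convert the static plot into an interactive one.
ggplotly(p)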
## Warning: Removed 22 rows containing missing values (position_stack).

IMDB Rating

Jury